Use of Bayesian Network in Information Extraction from Unstructured Data Sources

Author

Quratulain N. Rajput

Abstract

This paper applies Bayesian networks to support information extraction from unstructured, ungrammatical, and incoherent data sources for semantic annotation. A tool has been developed that combines ontologies, machine learning, information extraction, and probabilistic reasoning techniques to support the extraction process. Data acquisition is guided by knowledge specified in the form of an ontology. Because the amount of information available varies across data sources, the extracted data often contains missing values for certain variables of interest, and it is desirable to predict those values. The methodology presented in this paper first learns a Bayesian network from the training data and then uses it to predict missing data and to resolve conflicts. Experiments have been conducted to analyze the performance of the methodology. The results look promising: the methodology achieves a high degree of precision and recall for information extraction and reasonably good accuracy for predicting missing values.

Keywords: Information Extraction, Bayesian Network, Ontology, Machine Learning
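As a rough illustration of the "learn from training data, then predict a missing value" step described in the abstract, the sketch below fits the simplest possible Bayesian network, a naive Bayes structure in which the missing variable is the parent of every observed field. This is not the paper's tool or its learned network: the fixed structure, the field names (`brand`, `ram`, `price`), the toy records, and the helper names `fit` and `predict` are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def fit(records, target):
    """Estimate count-based parameters for a naive Bayes network.

    Assumed structure (not from the paper): `target` is the single parent
    of every other field in the record.
    """
    prior = Counter()              # target value -> count
    cond = defaultdict(Counter)    # (field, target value) -> field value counts
    values = defaultdict(set)      # field -> values seen in training
    for rec in records:
        t = rec[target]
        prior[t] += 1
        for field, value in rec.items():
            if field != target:
                cond[(field, t)][value] += 1
                values[field].add(value)
    return prior, cond, values


def predict(model, evidence, alpha=1.0):
    """Return the most probable target value and the full posterior."""
    prior, cond, values = model
    total = sum(prior.values())
    scores = {}
    for t, n_t in prior.items():
        p = n_t / total
        for field, value in evidence.items():
            if field not in values:
                continue  # ignore fields never seen during training
            counts = cond[(field, t)]
            # Laplace smoothing so unseen combinations do not zero the product
            p *= (counts[value] + alpha) / (
                sum(counts.values()) + alpha * len(values[field])
            )
        scores[t] = p
    norm = sum(scores.values())
    posterior = {t: s / norm for t, s in scores.items()}
    return max(posterior, key=posterior.get), posterior


# Hypothetical extracted records; "price" is the variable that may be missing.
training = [
    {"brand": "A", "ram": "8GB", "price": "high"},
    {"brand": "A", "ram": "8GB", "price": "high"},
    {"brand": "A", "ram": "4GB", "price": "low"},
    {"brand": "B", "ram": "4GB", "price": "low"},
    {"brand": "B", "ram": "4GB", "price": "low"},
    {"brand": "B", "ram": "8GB", "price": "high"},
]

model = fit(training, target="price")
best, posterior = predict(model, {"brand": "A", "ram": "8GB"})
print(best, posterior)
```

A learned network with a richer structure would replace the single product over fields with inference over the learned graph; the naive structure is used here only because it keeps the counting and the posterior computation visible in a few lines.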

Similar resources

A Note on Evolutionary Rate Estimation in Bayesian Evolutionary Analysis: Focus on Pathogens

Bayesian evolutionary analysis provides a statistically sound and flexible framework for estimation of evolutionary parameters. In this method, posterior estimates of evolutionary rate (μ) are derived by combining evolutionary information in the data with the researcher's prior knowledge about the true value of μ. Nucleotide sequence samples of fast evolving pathogens that are taken at d...

A Reference-set Approach to Information Extraction from Unstructured, Ungrammatical Data Sources

This thesis investigates information extraction from unstructured, ungrammatical text on the Web such as classified ads, auction listings, and forum postings. Since the data is unstructured and ungrammatical, this information extraction precludes the use of rule-based methods that rely on consistent structures within the text or natural language processing techniques that rely on grammar. Inste...

A Comparison of Two Ontology-Based Semantic Annotation Frameworks

The paper compares two semantic annotation frameworks that are designed for unstructured and ungrammatical domains. Both frameworks, namely ontoX (ontology-driven information Extraction) and BNOSA (Bayesian network and ontology based semantic annotation), extensively use ontologies during knowledge building, rule generation and data extraction phases. Both of them claim to be scalable as they a...

An Integrative Approach to Information Extraction

A huge amount of information is hidden within unstructured text. This information is often best exploited in structured or relational form, which is suited for many applications including Information Extraction. Information Extraction is the task of automatically extracting structured information from a given set of information, thus producing well-defined, categorized data from unstructured mach...

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) that has been intensified by the growing volume and heterogeneity of information and by its non-structured form. One of the core information extraction tasks is relation extraction wh...

Publication date: 2009
